Lexicalized Hidden Markov Models for Part-of-Speech Tagging

Authors

  • Sang-Zoo Lee
  • Jun'ichi Tsujii
  • Hae-Chang Rim
Abstract

Since most previous work on HMM-based tagging considers only part-of-speech information in context, these models cannot use lexical information, which is crucial for resolving some morphological ambiguity. In this paper we introduce uniformly lexicalized HMMs for part-of-speech tagging in both English and Korean. The lexicalized models use a simplified back-off smoothing technique to overcome data sparseness. In experiments, lexicalized models achieve higher accuracy than non-lexicalized models, and the back-off smoothing method mitigates data sparseness better than simple smoothing methods.
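The abstract does not spell out the model equations, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a lexicalized transition probability P(tag | previous tag, previous word) that backs off to the plain P(tag | previous tag) whenever the lexicalized context was never seen in training. The function names `train` and `transition_prob`, the `<s>` start symbol, and the no-discounting back-off rule are all illustrative assumptions.

```python
from collections import Counter


def train(tagged_sentences):
    """Collect counts for lexicalized and non-lexicalized tag transitions.

    tagged_sentences: iterable of sentences, each a list of (word, tag) pairs.
    """
    lex_ctx = Counter()    # (prev_tag, prev_word) -> count
    lex_trans = Counter()  # (prev_tag, prev_word, tag) -> count
    tag_ctx = Counter()    # prev_tag -> count
    tag_trans = Counter()  # (prev_tag, tag) -> count
    for sentence in tagged_sentences:
        # "<s>" is an assumed sentence-start symbol for both word and tag.
        prev_word, prev_tag = "<s>", "<s>"
        for word, tag in sentence:
            lex_ctx[(prev_tag, prev_word)] += 1
            lex_trans[(prev_tag, prev_word, tag)] += 1
            tag_ctx[prev_tag] += 1
            tag_trans[(prev_tag, tag)] += 1
            prev_word, prev_tag = word, tag
    return lex_ctx, lex_trans, tag_ctx, tag_trans


def transition_prob(counts, prev_tag, prev_word, tag):
    """Estimate P(tag | prev_tag, prev_word) with a simple back-off.

    Assumed rule: use the lexicalized relative frequency when the context
    (prev_tag, prev_word) was observed; otherwise fall back to the
    non-lexicalized relative frequency P(tag | prev_tag).
    """
    lex_ctx, lex_trans, tag_ctx, tag_trans = counts
    if lex_ctx[(prev_tag, prev_word)] > 0:
        return lex_trans[(prev_tag, prev_word, tag)] / lex_ctx[(prev_tag, prev_word)]
    if tag_ctx[prev_tag] > 0:
        return tag_trans[(prev_tag, tag)] / tag_ctx[prev_tag]
    return 0.0  # previous tag never seen as a context in training
```

Because this sketch backs off only when the whole lexicalized context is unseen, each context still yields a proper distribution without discounting. A scheme that also backs off for unseen events within a seen context would need discounting and back-off weights.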

Similar resources

HMM Specialization with Selective Lexicalization

We present a technique which complements Hidden Markov Models by incorporating some lexicalized states representing syntactically uncommon words. Our approach examines the distribution of transitions, selects the uncommon words, and makes lexicalized states for the words. We performed a part-of-speech tagging experiment on the Brown corpus to evaluate the resultant language model and discovered...

Improving part-of-speech tagging using lexicalized HMMs

We introduce a simple method to build Lexicalized Hidden Markov Models (L-HMMs) for improving the precision of part-of-speech tagging. This technique enriches the contextual Language Model taking into account a set of selected words empirically obtained. The evaluation was conducted with different lexicalization criteria on the Penn Treebank corpus using the TnT tagger. This lexicalization obta...

Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models

We tackle unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem. These HMMs, which we call anchor HMMs, assume that each tag is associated with at least one word that can have no other tag, which is a relatively benign condition for POS tagging (e.g., “the” is a word that appears only under the determiner tag). We exp...

A Morpheme-based Part-of-Speech Tagger for Chinese

This paper presents a morpheme-based part-of-speech tagger for Chinese. It consists of two main components, namely a morpheme segmenter to segment each word in a sentence into a sequence of morphemes, based on forward maximum matching, and a lexical tagger to label each morpheme with a proper tag indicating its position pattern in forming a word of a specific class, based on lexicalized hidden ...

Part-of-Speech Tagging of Persian Using a Fuzzy Network Model

Part-of-speech tagging (POS tagging) is an ongoing area of research in natural language processing (NLP). The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...

Publication date: 2000